The transition to renewable energy is a critical issue for mitigating climate change and promoting sustainable development. As the world’s second-largest carbon emitter (https://www.epa.gov/ghgemissions/global-greenhouse-gas-emissions-data), the United States has set ambitious targets for reducing greenhouse gas emissions and increasing renewable energy generation. However, the factors that drive renewable energy adoption in the US are not fully understood, and there is a need for more research to identify the most effective policies and incentives for promoting clean energy.
In this project, we aim to investigate the key drivers of renewable energy adoption in the US and their relative importance. We will use a comprehensive dataset derived from public-facing websites which includes information on state-level renewable energy capacity, energy policies, economic indicators, and demographic variables.
Our research questions include: 1. How do economic indicators, such as energy prices, GDP, and oil production, influence renewable energy adoption in different regions of the US? 2. What role do demographic variables, such as IQ and political affiliation, play in shaping attitudes towards renewable energy and driving adoption?
knitr::opts_chunk$set(echo = TRUE)
#1
#Install familiar packages
library(tidyverse);library(lubridate);library(viridis);library(here)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.0
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Loading required package: timechange
##
##
## Attaching package: 'lubridate'
##
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
##
##
## Loading required package: viridisLite
##
## here() starts at /home/guest/R/LiuSmootZhang_ENV872_EDA_FinalProject
#install.packages("rvest")
library(rvest)
##
## Attaching package: 'rvest'
##
## The following object is masked from 'package:readr':
##
## guess_encoding
#install.packages("dataRetrieval")
library(dataRetrieval)
#install.packages("tidycensus")
library(tidycensus)
library(dplyr)
library(rvest)
library(ggplot2)
#install.packages("corrplot")
library(corrplot)
## corrplot 0.92 loaded
library(stringr)
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
library(leaflet)
library(mapview); mapviewOptions(fgb = FALSE)
sf::sf_use_s2(FALSE)
## Spherical geometry (s2) switched off
The dataset for this analysis was collected from various sources including the National Renewable Energy Laboratory (NREL), the United States Energy Information Administration (EIA), and wisevoter.com. The dataset includes information on various factors that may drive renewable energy adoption in the US, such as state-level policies, economic indicators, demographic characteristics, and natural resources.
The dataset was first imported into R as separate data frames and then merged into a single data frame using the merge() function. Before merging, data cleaning and manipulation were performed to ensure consistency across variables and eliminate missing values. For instance, some variables were renamed, and missing values were replaced with appropriate values or removed entirely.
The final dataset contains 50 observations (one for each state) and 7 variables. The variables in the dataset are summarized in the table below: | Variable | Unit | Range / Central Tendency | Data Source | |———————————|———–|————————–|————————————-| | Good_Air_Quality_Days_Percentage| Percentage| 35.8-89.4 | wisevoter.com | | Oil_Production | Thousand barrels per day | 0-4700| wisevoter.com | | GDP | Billion dollars | 54.7-2,784.1 | wisevoter.com | | Sunshine | Hours per day | 3.7-10.3 | wisevoter.com | | Electricity_Cost | Cents per kilowatt hour | 5.6-20.5 | wisevoter.com | | Average_IQ | IQ points | 94.8-103.9 | wisevoter.com | | Republican_vote | Percentage| 23.5-70.4 | wisevoter.com |
url_aq <- "https://wisevoter.com/state-rankings/air-quality-by-state/"
page1 <- read_html(url_aq)
states <- page1 %>%
html_nodes("td:nth-child(2)") %>%
html_text()
good.air.days <- page1 %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_aq <- data.frame(
States = states,
Good_Air_Quality_Days_Percentage = good.air.days
)
url_oil <- "https://wisevoter.com/state-rankings/oil-production-by-state/"
page2 <- read_html(url_oil)
states2 <- page2 %>%
html_nodes("td:nth-child(2)") %>%
html_text()
oil.production <- page2 %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_oil <- data.frame(
States = states2,
Oil_Production = oil.production
)
url_gdp <- "https://wisevoter.com/state-rankings/gdp-by-state/"
page <- read_html(url_gdp)
states3 <- page %>%
html_nodes("td:nth-child(2)") %>%
html_text()
gdp <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_gdp <- data.frame(
States = states3,
GDP = gdp
)
url_sunny <- "https://wisevoter.com/state-rankings/sunniest-states/"
page <- read_html(url_sunny)
states4 <- page %>%
html_nodes("td:nth-child(2)") %>%
html_text()
sunshine <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_sunny <- data.frame(
States = states4,
Sunshine = sunshine
)
url_elec <- "https://wisevoter.com/state-rankings/electricity-cost-by-state/"
page <- read_html(url_elec)
states5 <- page %>%
html_nodes("td:nth-child(2)") %>%
html_text()
cost <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_ecost <- data.frame(
States = states5,
Electricity_Cost = cost
)
url_iq <- "https://wisevoter.com/state-rankings/average-iq-by-state/"
page <- read_html(url_iq)
states6 <- page %>%
html_nodes("td:nth-child(2)") %>%
html_text()
iq <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_iq <- data.frame(
States = states6,
Average_IQ = iq
)
url_party <- "https://wisevoter.com/state-rankings/red-and-blue-states"
page <- read_html(url_party)
states7 <- page %>%
html_nodes("td:nth-child(1)") %>%
html_text()
rep_vote <- page %>%
html_nodes("td:nth-child(3)") %>%
html_text()
df_party <- data.frame(
States = states7,
Republican_vote = rep_vote
)
# Merge data frames
state_rankings_df_merge <- Reduce(function(x, y) merge(x, y, by = "States", all = TRUE),
list(df_aq, df_oil, df_gdp, df_sunny, df_ecost, df_iq, df_party))
# View merged data frame
state_rankings_df_merge
## States Good_Air_Quality_Days_Percentage Oil_Production
## 1 Alabama 85.8% 425 Mbbl
## 2 Alaska 92.3% 159,623 Mbbl
## 3 Arizona 64.6% 6 Mbbl
## 4 Arkansas 83% 4,179 Mbbl
## 5 California 60.6% 130,593 Mbbl
## 6 Colorado 75.4% 144,621 Mbbl
## 7 Connecticut 83.4% <NA>
## 8 Delaware 80.3% <NA>
## 9 District of Columbia 69% <NA>
## 10 Florida 89.8% 1,492 Mbbl
## 11 Georgia 82.1% <NA>
## 12 Hawaii 99.4% <NA>
## 13 Idaho 79% 25 Mbbl
## 14 Illinois 75% 7,054 Mbbl
## 15 Indiana 80.9% 1,523 Mbbl
## 16 Iowa 78.3% <NA>
## 17 Kansas 80.8% 27,518 Mbbl
## 18 Kentucky 85.2% 2,464 Mbbl
## 19 Louisiana 87.9% 32,843 Mbbl
## 20 Maine 93.2% <NA>
## 21 Maryland 84.2% <NA>
## 22 Massachusetts 87.1% <NA>
## 23 Michigan 79.9% 4,454 Mbbl
## 24 Minnesota 83.4% <NA>
## 25 Mississippi 85.5% 1,343 Mbbl
## 26 Missouri 86.1% 50 Mbbl
## 27 Montana 80.8% 18,731 Mbbl
## 28 Nebraska 86.8% 1,593 Mbbl
## 29 Nevada 72.5% 210 Mbbl
## 30 New Hampshire 91% <NA>
## 31 New Jersey 82.3% <NA>
## 32 New Mexico 78.9% 459,792 Mbbl
## 33 New York 88.7% 178 Mbbl
## 34 North Carolina 86.3% <NA>
## 35 North Dakota 81.1% 393,763 Mbbl
## 36 Ohio 80.9% 18,147 Mbbl
## 37 Oklahoma 76.7% 143,052 Mbbl
## 38 Oregon 83% <NA>
## 39 Pennsylvania 79.2% 6,357 Mbbl
## 40 Rhode Island 86.7% <NA>
## 41 South Carolina 84.9% <NA>
## 42 South Dakota 82.1% 1,028 Mbbl
## 43 Tennessee 84.5% 206 Mbbl
## 44 Texas 80.5% 1,741,444 Mbbl
## 45 Utah 70.2% 36,317 Mbbl
## 46 Vermont 91.3% <NA>
## 47 Virginia 91.5% 3 Mbbl
## 48 Washington 88.7% <NA>
## 49 West Virginia 86.7% 18,918 Mbbl
## 50 Wisconsin 79.8% <NA>
## 51 Wyoming 78.9% 84,753 Mbbl
## GDP Sunshine Electricity_Cost Average_IQ Republican_vote
## 1 $257,465,000,000 4,660 kJ/m² $0.1 per kWh 95.7 62%
## 2 $57,983,000,000 <NA> $0.2 per kWh 99 52.8%
## 3 $429,819,000,000 5,755 kJ/m² $0.11 per kWh 97.4 49.1%
## 4 $150,483,000,000 4,724 kJ/m² $0.09 per kWh 97.5 62.4%
## 5 $3,513,348,000,000 5,050 kJ/m² $0.2 per kWh 95.5 34.3%
## 6 $440,903,000,000 4,960 kJ/m² $0.11 per kWh 101.6 41.9%
## 7 $308,671,000,000 3,988 kJ/m² $0.18 per kWh 103.1 39.2%
## 8 $84,206,000,000 4,232 kJ/m² $0.11 per kWh 100.4 39.8%
## 9 $156,457,000,000 4,307 kJ/m² $0.13 per kWh <NA> 5.4%
## 10 $1,286,086,000,000 4,859 kJ/m² $0.11 per kWh 98.4 51.2%
## 11 $713,948,000,000 4,661 kJ/m² $0.1 per kWh 98 49.2%
## 12 $94,937,000,000 <NA> $0.3 per kWh 95.6 34.3%
## 13 $98,455,000,000 4,251 kJ/m² $0.08 per kWh 101.4 63.8%
## 14 $973,486,000,000 4,380 kJ/m² $0.1 per kWh 99.9 40.6%
## 15 $438,012,000,000 4,318 kJ/m² $0.1 per kWh 101.7 57%
## 16 $225,696,000,000 4,331 kJ/m² $0.09 per kWh 103.2 53.1%
## 17 $198,291,000,000 4,890 kJ/m² $0.11 per kWh 102.8 56.2%
## 18 $244,480,000,000 4,383 kJ/m² $0.09 per kWh 99.4 62.1%
## 19 $267,126,000,000 4,725 kJ/m² $0.09 per kWh 95.3 58.5%
## 20 $79,177,000,000 3,815 kJ/m² $0.14 per kWh 103.4 44%
## 21 $451,986,000,000 4,267 kJ/m² $0.12 per kWh 99.7 32.2%
## 22 $663,750,000,000 3,944 kJ/m² $0.19 per kWh 104.3 32.1%
## 23 $592,349,000,000 4,018 kJ/m² $0.13 per kWh 100.5 47.8%
## 24 $429,391,000,000 3,968 kJ/m² $0.11 per kWh 103.7 45.3%
## 25 $129,974,000,000 4,693 kJ/m² $0.1 per kWh 94.2 57.6%
## 26 $373,105,000,000 4,545 kJ/m² $0.1 per kWh 101 56.8%
## 27 $61,983,000,000 3,847 kJ/m² $0.1 per kWh 103.4 56.9%
## 28 $154,149,000,000 4,685 kJ/m² $0.09 per kWh 102.3 58.2%
## 29 $204,306,000,000 5,296 kJ/m² $0.09 per kWh 96.5 47.7%
## 30 $102,439,000,000 3,891 kJ/m² $0.17 per kWh 104.2 45.4%
## 31 $700,119,000,000 4,056 kJ/m² $0.14 per kWh 102.8 41.4%
## 32 $114,680,000,000 5,642 kJ/m² $0.1 per kWh 95.7 43.5%
## 33 $1,914,207,000,000 3,904 kJ/m² $0.16 per kWh 100.7 37.8%
## 34 $684,607,000,000 4,466 kJ/m² $0.09 per kWh 100.2 49.9%
## 35 $66,372,000,000 3,925 kJ/m² $0.09 per kWh 103.8 65.1%
## 36 $765,003,000,000 4,139 kJ/m² $0.1 per kWh 101.8 53.3%
## 37 $218,564,000,000 4,912 kJ/m² $0.09 per kWh 99.3 65.4%
## 38 $279,424,000,000 3,830 kJ/m² $0.09 per kWh 101.2 40.4%
## 39 $874,881,000,000 3,939 kJ/m² $0.1 per kWh 101.5 48.8%
## 40 $68,823,000,000 3,989 kJ/m² $0.18 per kWh 99.5 38.6%
## 41 $281,754,000,000 4,624 kJ/m² $0.1 per kWh 98.4 55.1%
## 42 $62,809,000,000 4,332 kJ/m² $0.1 per kWh 102.8 61.8%
## 43 $439,050,000,000 4,486 kJ/m² $0.1 per kWh 97.7 60.7%
## 44 $2,104,579,000,000 5,137 kJ/m² $0.09 per kWh 100 52.1%
## 45 $230,342,000,000 4,887 kJ/m² $0.08 per kWh 101.1 58.1%
## 46 $37,644,000,000 3,826 kJ/m² $0.16 per kWh 103.8 30.7%
## 47 $614,754,000,000 4,354 kJ/m² $0.09 per kWh 101.9 44%
## 48 $696,748,000,000 3,467 kJ/m² $0.09 per kWh 101.9 38.8%
## 49 $91,953,000,000 4,146 kJ/m² $0.09 per kWh 98.7 68.6%
## 50 $379,909,000,000 4,023 kJ/m² $0.11 per kWh 102.9 48.8%
## 51 $44,323,000,000 4,471 kJ/m² $0.08 per kWh 102.4 69.9%
write.csv(state_rankings_df_merge, file = "state rankings_raw.csv",
row.names = FALSE)
Next, we clean the data scraped from various websites before merging it with the power plant data. The dataframe we have right now consisted of columns made of chars. After removing the units (such as per kWh) and characters (such as comma and $), we converted the chars to numeric values and created additional columns as needed based on existing columns.
# read in raw data
state_ranking <- read.csv('state rankings_raw.csv', header = TRUE)
# save a copy of raw data before making changes
state_ranking_raw <- state_ranking
# remove % in good_air_q col
state_ranking$Good_Air_Quality_Days_Percentage <- str_sub(state_ranking$Good_Air_Quality_Days_Percentage, end = -2)
# change chr to numeric
state_ranking$Good_Air_Quality_Days_Percentage <- as.numeric(state_ranking$Good_Air_Quality_Days_Percentage)
# remove unit in oil_production
state_ranking$Oil_Production <- str_sub(state_ranking$Oil_Production, end = -5)
# remove comma
state_ranking$Oil_Production <- gsub(',','', state_ranking$Oil_Production)
# change chr to numeric
state_ranking$Oil_Production <- as.numeric(state_ranking$Oil_Production)
# remove $ in GDP
state_ranking$GDP <- str_sub(state_ranking$GDP, 2)
# remove comma
state_ranking$GDP <- gsub(',','', state_ranking$GDP)
# change to numeric
state_ranking$GDP <- as.numeric(state_ranking$GDP)
# remove unit in sunshine
state_ranking$Sunshine <- str_sub(state_ranking$Sunshine, end=-6)
# remove comma
state_ranking$Sunshine <- gsub(',','', state_ranking$Sunshine)
# change to numeric
state_ranking$Sunshine <- as.numeric(state_ranking$Sunshine)
# clean electricity_cost
state_ranking$Electricity_Cost <- str_sub(state_ranking$Electricity_Cost, end=-8)
state_ranking$Electricity_Cost <- str_sub(state_ranking$Electricity_Cost, 2)
state_ranking$Electricity_Cost <- as.numeric(state_ranking$Electricity_Cost)
# clean average
state_ranking$Average_IQ <- as.numeric(state_ranking$Average_IQ)
# clean republican_vote
state_ranking$Republican_vote <- str_sub(state_ranking$Republican_vote, end=-2)
state_ranking$Republican_vote <- as.numeric(state_ranking$Republican_vote)
# create new column: party affiliation
state_ranking$party <- with(state_ranking, ifelse(Republican_vote > 50, "Red", "Blue"))
For the response variable, clean energy adoption rate, data wrangling and calculation are needed. We first transform all NAs to 0 so that we can later aggregate the capacity for each energy source. The next step is to calculate the total clean energy capacity and total energy generation capacity for each power plant. After that, we aggregate all the power plants by states and calculate the clean energy percentage for each state.
Power_plant <- read.csv(here("data_raw/Power_Plants.csv"), stringsAsFactors = TRUE)
Power_plant[is.na(Power_plant)] <- 0
## Warning in `[<-.factor`(`*tmp*`, thisvar, value = 0): invalid factor level, NA
## generated
## Warning in `[<-.factor`(`*tmp*`, thisvar, value = 0): invalid factor level, NA
## generated
Power_plant <- Power_plant %>%
mutate(Total_clean = Bio_MW + Geo_MW + Hydro_MW + HydroPS_MW + Nuclear_MW + Solar_MW + Wind_MW) %>%
mutate(Total = Total_clean + Coal_MW + NG_MW + Crude_MW)
Clean_energy <- aggregate(cbind(Power_plant$Total_clean, Power_plant$Total), list(Power_plant$State), FUN = sum)
colnames(Clean_energy) <- c("States", "Clean_energy_MW", "Total_MW")
Clean_energy$Clean_percent <- Clean_energy$Clean_energy_MW/Clean_energy$Total_MW
We merge the two clean datasets as one final dataset that is ready for analysis.
final_df <- merge(Clean_energy, state_ranking)
We plot correlation plots to explore the covariance among response and explanatory variables.
final_naomit <- final_df %>%
select(Clean_percent:Republican_vote) %>%
na.omit()
final_corr <- cor(final_naomit)
corrplot.mixed(final_corr, upper = "ellipse", tl.cex = 0.6)
We want to take a deeper dive into the distribution of each variables by histograms. Based on the histograms, oil production, GDP and electricity cost are skewed, which indicates that transformation is needed for those variables.
par(mfrow=c(2,4))
hist(final_naomit$Clean_percent, xlab = "Clean Energy Adoption %", main = NULL)
hist(final_naomit$Good_Air_Quality_Days_Percentage, xlab = "Good Air Quality Days %", main = NULL)
hist(final_naomit$Oil_Production, xlab = "Good Air Quality Days %", main = NULL)
hist(final_naomit$GDP, xlab = "GDP", main = NULL)
hist(final_naomit$Sunshine, xlab = "Sunshine", main = NULL)
hist(final_naomit$Electricity_Cost, xlab = "Electricity Cost", main = NULL)
hist(final_naomit$Average_IQ, xlab = "Average IQ", main = NULL)
hist(final_naomit$Republican_vote, xlab = "Republican Vote", main = NULL)
Next, we visualize our power plants data by making an interactive map widget with the leaflet package. We first categorized the power plant as “clean energy” or “other” based on the “primary source” attribute - we considered biomass, geothermal, hydroelectric, pumped storage, nuclear, solar, and wind to be clean. For the map, we plotted clean energy power plants as blue and other as red; the size of the markers was scaled to indicate the maximum output and was expressed in megawatts in original data. You can pan around and zoom in and out to explore the dataset.
# read in the shapefile
powerp <- st_read('../LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/Power_Plants.shp')
## Reading layer `Power_Plants' from data source
## `/home/guest/R/LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/Power_Plants.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 11569 features and 32 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -171.7124 ymin: 17.94712 xmax: -65.27956 ymax: 71.292
## Geodetic CRS: WGS 84
powerp$PrimSource <- as.factor(powerp$PrimSource)
unique(powerp$PrimSource)
## [1] petroleum hydroelectric natural gas nuclear coal
## [6] pumped storage geothermal wind biomass batteries
## [11] other solar
## 12 Levels: batteries biomass coal geothermal hydroelectric ... wind
powerp$PrimSource <- str_trim(powerp$PrimSource, 'both')
# clean energy: biomass, geothermal, hydroelectric, pumped, nuclear, solar, wind
# create new column: clean energy
powerp$clean <- with(powerp,
ifelse(PrimSource=='biomass'|PrimSource=='geothermal'|PrimSource=='hydroelectric'|PrimSource=='pumpedstorage'|PrimSource=='nuclear'|PrimSource=='solar'|PrimSource=='wind', 'clean energy', 'others'))
powerp$clean <- as.factor(powerp$clean)
# colors for the map widget
pal <- colorFactor(c('skyblue','tomato'), powerp$clean)
# make the map widget
leaflet(powerp) %>%
addProviderTiles(providers$CartoDB.Positron)%>%
addCircleMarkers(
radius = ~sqrt(Total_MW)/5,
color = ~pal(clean),
stroke = FALSE, fillOpacity = 0.5
) %>%
addLegend("bottomright", pal = pal, values = ~clean,
title = "Power plants",
opacity = 1
)
We then mapped the clean energy percentage in each state. Thanks to the help from https://rstudio.github.io/leaflet/choropleths.html, we were able to make this interactive map widge. You can pan around, zoom, and hover your mouse over each state to see its clean enegery percentage and state name in a popup window. The state boundaries data came from https://www.sciencebase.gov/catalog/item/52c78623e4b060b9ebca5be5.
# read in the shapefile
states_shp <- st_read('../LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/tl_2012_us_state.shp')
## Reading layer `tl_2012_us_state' from data source
## `/home/guest/R/LiuSmootZhang_ENV872_EDA_FinalProject/data_raw/tl_2012_us_state.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -19951910 ymin: -1643352 xmax: 20021890 ymax: 11554790
## Projected CRS: Popular Visualisation CRS / Mercator
states_shp <- st_transform(states_shp, c=4326)
# join the dataframe with clean energy percentage by state to the state shapefile
join_states <- full_join(states_shp, final_df, by = c('NAME' = 'States'))
# visualize on map
bins <- c(NA, 0, 0.20, 0.40, 0.60, 0.80, 1.00)
pal2 <- colorBin("Greens", domain = join_states$Clean_percent, bins = bins)
labels <- sprintf(
"<strong>%s</strong><br/>%g ",
join_states$NAME, join_states$Clean_percent) %>% lapply(htmltools::HTML)
leaflet(join_states) %>%
setView(-96, 37.8, 3) %>%
addPolygons(
fillColor = ~pal2(Clean_percent),
weight = 2,
opacity = 1,
color = "white",
dashArray = "3",
fillOpacity = 0.7,
highlightOptions = highlightOptions(
weight = 5,
color = "#666",
dashArray = "",
fillOpacity = 0.7,
bringToFront = TRUE),
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto")) %>%
addLegend(pal = pal2, values = ~Clean_percent, opacity = 0.7, title = NULL,
position = "bottomright")